A dynamic cache sub-block design to reduce false sharing

Authors

  • Murali Kadiyala
  • Laxmi N. Bhuyan
Abstract

Parallel applications suffer from significant bus traffic due to the transfer of shared data. Large block sizes exploit locality and decrease the effective memory access time. However, they also tend to group data together even though only part of a block is needed by any one processor. This is known as the false sharing problem. This research presents a dynamic sub-block coherence protocol that minimizes false sharing by trying to dynamically locate the point of false reference. Sharing traffic is minimized by maintaining coherence on the smaller blocks (sub-blocks) that are truly shared, while larger blocks are used as the basic units of transfer. Larger blocks thus exploit locality, while maintaining coherence on sub-blocks minimizes the bus traffic caused by shared misses. The simulation results indicate that the dynamic sub-block protocol reduces false sharing misses by 20 to 30 percent over the fixed sub-block scheme.
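To make the split between the transfer unit (the block) and the coherence unit (the sub-block) concrete, here is a minimal sketch, not the authors' simulator. It assumes a snoopy write-invalidate protocol with infinite caches, so only cold and coherence misses occur. The function `count_misses` and the `(proc, op, word_addr)` trace format are hypothetical. The sketch models a fixed sub-block scheme only: the abstract does not say how the dynamic protocol locates the point of false reference, so that mechanism is not modeled here.

```python
from collections import defaultdict


def count_misses(trace, block_words, unit_words):
    """Count misses under a write-invalidate protocol with infinite caches.

    trace       -- iterable of (proc, op, word_addr) tuples, op is "R" or "W"
    block_words -- transfer unit: a miss fetches the whole enclosing block
    unit_words  -- coherence unit: valid bits and invalidations are per unit
                   (unit_words == block_words models a conventional protocol)
    """
    if block_words % unit_words != 0:
        raise ValueError("block size must be a multiple of the sub-block size")
    units_per_block = block_words // unit_words
    valid = defaultdict(set)  # proc -> ids of coherence units it holds valid
    misses = 0
    for proc, op, addr in trace:
        unit = addr // unit_words
        if unit not in valid[proc]:
            misses += 1
            # The whole block is transferred, so every unit in it becomes valid.
            first_unit = (addr // block_words) * units_per_block
            valid[proc].update(range(first_unit, first_unit + units_per_block))
        if op == "W":
            # Invalidate only the written coherence unit in the other caches.
            for other, units in valid.items():
                if other != proc:
                    units.discard(unit)
    return misses


# Hypothetical trace: two processors alternately write different words
# (word 0 and word 7) of the same 8-word block, i.e. pure false sharing.
trace = [(0, "W", 0), (1, "W", 7)] * 4
print(count_misses(trace, block_words=8, unit_words=8))  # whole-block coherence → 8
print(count_misses(trace, block_words=8, unit_words=4))  # 4-word sub-blocks → 2
```

With whole-block coherence every access misses, because each write invalidates the other processor's copy of the entire block. With 4-word sub-blocks the two words fall into different sub-blocks, so after the two cold misses every access hits. If both processors wrote the same word (true sharing), sub-blocking would not reduce the misses.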

Similar resources

Reconciling Sharing and Spatial Locality Using Adjustable Block Size Coherent Caches

Several studies have shown that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by the program. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increas...

Efficient Resource Oblivious Algorithms for Multicores

We consider the design of efficient algorithms for a multicore computing environment with a global shared memory and p cores, each having a cache of size M , and with data organized in blocks of size B. We characterize the class of ‘Hierarchical Balanced Parallel (HBP)’ multithreaded computations for multicores. HBP computations are similar to the hierarchical divide & conquer algorithms consid...

False Sharing and Spatial Locality in Multiprocessor Caches

The performance of the data cache in shared-memory multiprocessors has been shown to be different from that in uniprocessors. In particular, cache miss rates in multiprocessors do not show the sharp drop typical of uniprocessors when the size of the cache block increases. The resulting high cache miss rate is a cause of concern, since it can significantly limit the performance of multiprocessors....

Design and Evaluation of a Subblock Cache Coherence Protocol for Bus-Based Multiprocessors

Parallel applications exhibit a wide variety of memory reference patterns. Designing a memory architecture that serves all applications well is not easy. However, because tolerating or reducing memory latency is a priority in effective parallel processing, it is important to explore new techniques to reduce memory traffic. In this paper, we describe a snoopy cache coherence protocol that uses a la...

Publication date: 1995